Overview

Dataset statistics

Number of variables: 16
Number of observations: 105
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.2 KiB
Average record size in memory: 129.2 B

Variable types

Numeric: 15
Categorical: 1

Alerts

Unnamed: 0 is highly correlated with ID [High correlation]
ID is highly correlated with Unnamed: 0 [High correlation]
CRIM is highly correlated with ZN and 10 other fields [High correlation]
ZN is highly correlated with CRIM and 5 other fields [High correlation]
INDUS is highly correlated with CRIM and 7 other fields [High correlation]
NOX is highly correlated with CRIM and 8 other fields [High correlation]
RM is highly correlated with LSTAT and 1 other field [High correlation]
AGE is highly correlated with CRIM and 7 other fields [High correlation]
DIS is highly correlated with CRIM and 7 other fields [High correlation]
RAD is highly correlated with CRIM and 2 other fields [High correlation]
TAX is highly correlated with CRIM and 9 other fields [High correlation]
PTRATIO is highly correlated with CRIM and 3 other fields [High correlation]
B is highly correlated with CRIM and 1 other field [High correlation]
LSTAT is highly correlated with CRIM and 8 other fields [High correlation]
MEDV Predicted by linear model is highly correlated with CRIM and 9 other fields [High correlation]

Unnamed: 0 is highly correlated with ID [High correlation]
ID is highly correlated with Unnamed: 0 [High correlation]
CRIM is highly correlated with RAD and 4 other fields [High correlation]
ZN is highly correlated with INDUS and 4 other fields [High correlation]
INDUS is highly correlated with ZN and 7 other fields [High correlation]
NOX is highly correlated with ZN and 6 other fields [High correlation]
RM is highly correlated with LSTAT and 1 other field [High correlation]
AGE is highly correlated with ZN and 6 other fields [High correlation]
DIS is highly correlated with ZN and 5 other fields [High correlation]
RAD is highly correlated with CRIM and 6 other fields [High correlation]
TAX is highly correlated with CRIM and 8 other fields [High correlation]
PTRATIO is highly correlated with MEDV Predicted by linear model [High correlation]
B is highly correlated with CRIM and 2 other fields [High correlation]
LSTAT is highly correlated with CRIM and 5 other fields [High correlation]
MEDV Predicted by linear model is highly correlated with CRIM and 9 other fields [High correlation]

Unnamed: 0 is highly correlated with ID [High correlation]
ID is highly correlated with Unnamed: 0 [High correlation]
CRIM is highly correlated with ZN and 4 other fields [High correlation]
ZN is highly correlated with CRIM and 3 other fields [High correlation]
INDUS is highly correlated with ZN and 4 other fields [High correlation]
NOX is highly correlated with CRIM and 5 other fields [High correlation]
RM is highly correlated with MEDV Predicted by linear model [High correlation]
AGE is highly correlated with INDUS and 2 other fields [High correlation]
DIS is highly correlated with CRIM and 4 other fields [High correlation]
RAD is highly correlated with CRIM and 1 other field [High correlation]
TAX is highly correlated with CRIM and 3 other fields [High correlation]
PTRATIO is highly correlated with MEDV Predicted by linear model [High correlation]
LSTAT is highly correlated with MEDV Predicted by linear model [High correlation]
MEDV Predicted by linear model is highly correlated with RM and 2 other fields [High correlation]

Unnamed: 0 is highly correlated with ID [High correlation]
ID is highly correlated with Unnamed: 0 [High correlation]
CRIM is highly correlated with NOX and 6 other fields [High correlation]
ZN is highly correlated with INDUS and 6 other fields [High correlation]
INDUS is highly correlated with ZN and 6 other fields [High correlation]
NOX is highly correlated with CRIM and 7 other fields [High correlation]
RM is highly correlated with CRIM and 4 other fields [High correlation]
AGE is highly correlated with ZN and 5 other fields [High correlation]
DIS is highly correlated with ZN and 6 other fields [High correlation]
RAD is highly correlated with CRIM and 8 other fields [High correlation]
TAX is highly correlated with CRIM and 6 other fields [High correlation]
PTRATIO is highly correlated with ZN and 7 other fields [High correlation]
B is highly correlated with CRIM and 3 other fields [High correlation]
LSTAT is highly correlated with CRIM and 4 other fields [High correlation]
MEDV Predicted by linear model is highly correlated with CRIM and 5 other fields [High correlation]

Unnamed: 0 is uniformly distributed [Uniform]
ID is uniformly distributed [Uniform]
Unnamed: 0 has unique values [Unique]
ID has unique values [Unique]
CRIM has unique values [Unique]
MEDV Predicted by linear model has unique values [Unique]
ZN has 76 (72.4%) zeros [Zeros]

Reproduction

Analysis started: 2021-12-08 11:18:50.761766
Analysis finished: 2021-12-08 11:19:26.123589
Duration: 35.36 seconds
Software version: pandas-profiling v3.1.1
Download configuration: config.json

Variables

Unnamed: 0
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 105
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52
Minimum: 0
Maximum: 104
Zeros: 1
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 0
5-th percentile: 5.2
Q1: 26
median: 52
Q3: 78
95-th percentile: 98.8
Maximum: 104
Range: 104
Interquartile range (IQR): 52

Descriptive statistics

Standard deviation: 30.45488467
Coefficient of variation (CV): 0.585670859
Kurtosis: -1.2
Mean: 52
Median Absolute Deviation (MAD): 26
Skewness: 0
Sum: 5460
Variance: 927.5
Monotonicity: Strictly increasing
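As a cross-check on the statistics above, the sketch below recomputes them with Python's standard `statistics` module. It relies on Unnamed: 0 holding exactly the integers 0 to 104 (minimum 0, maximum 104, 105 distinct values), and it assumes the report uses the sample variance (n - 1 denominator) and linear-interpolation quantiles, which `method="inclusive"` reproduces. It is an illustration, not pandas-profiling's own code.

```python
import statistics

# Unnamed: 0 holds the integers 0..104, one per row (105 rows, all distinct).
values = list(range(105))

mean = statistics.mean(values)
variance = statistics.variance(values)   # sample variance, n - 1 denominator
std = statistics.stdev(values)           # square root of the sample variance
cv = std / mean                          # coefficient of variation = std / mean

# Median absolute deviation: the median of |x - median(x)|.
median = statistics.median(values)
mad = statistics.median([abs(v - median) for v in values])

# Quartiles and 5th/95th percentiles, linearly interpolated between order statistics.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
ventiles = statistics.quantiles(values, n=20, method="inclusive")
p5, p95 = ventiles[0], ventiles[-1]

print(f"mean={mean} variance={variance} std={std:.8f} cv={cv:.9f}")
print(f"MAD={mad} Q1={q1} Q3={q3} IQR={q3 - q1} p5={p5:.1f} p95={p95:.1f}")
```

Within display rounding, the printed values should agree with the figures listed for Unnamed: 0 (and for ID, which holds the same values).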
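Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
104 | 1 | 1.0%
51 | 1 | 1.0%
27 | 1 | 1.0%
28 | 1 | 1.0%
29 | 1 | 1.0%
30 | 1 | 1.0%
31 | 1 | 1.0%
32 | 1 | 1.0%
33 | 1 | 1.0%
34 | 1 | 1.0%
Other values (95) | 95 | 90.5%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1 | 1.0%
1 | 1 | 1.0%
2 | 1 | 1.0%
3 | 1 | 1.0%
4 | 1 | 1.0%
5 | 1 | 1.0%
6 | 1 | 1.0%
7 | 1 | 1.0%
8 | 1 | 1.0%
9 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
104 | 1 | 1.0%
103 | 1 | 1.0%
102 | 1 | 1.0%
101 | 1 | 1.0%
100 | 1 | 1.0%
99 | 1 | 1.0%
98 | 1 | 1.0%
97 | 1 | 1.0%
96 | 1 | 1.0%
95 | 1 | 1.0%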

ID
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIFORM
UNIQUE

Distinct: 105
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52
Minimum: 0
Maximum: 104
Zeros: 1
Zeros (%): 1.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 0
5-th percentile: 5.2
Q1: 26
median: 52
Q3: 78
95-th percentile: 98.8
Maximum: 104
Range: 104
Interquartile range (IQR): 52

Descriptive statistics

Standard deviation: 30.45488467
Coefficient of variation (CV): 0.585670859
Kurtosis: -1.2
Mean: 52
Median Absolute Deviation (MAD): 26
Skewness: 0
Sum: 5460
Variance: 927.5
Monotonicity: Strictly increasing

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
104 | 1 | 1.0%
51 | 1 | 1.0%
27 | 1 | 1.0%
28 | 1 | 1.0%
29 | 1 | 1.0%
30 | 1 | 1.0%
31 | 1 | 1.0%
32 | 1 | 1.0%
33 | 1 | 1.0%
34 | 1 | 1.0%
Other values (95) | 95 | 90.5%

Minimum 10 values
Value | Count | Frequency (%)
0 | 1 | 1.0%
1 | 1 | 1.0%
2 | 1 | 1.0%
3 | 1 | 1.0%
4 | 1 | 1.0%
5 | 1 | 1.0%
6 | 1 | 1.0%
7 | 1 | 1.0%
8 | 1 | 1.0%
9 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
104 | 1 | 1.0%
103 | 1 | 1.0%
102 | 1 | 1.0%
101 | 1 | 1.0%
100 | 1 | 1.0%
99 | 1 | 1.0%
98 | 1 | 1.0%
97 | 1 | 1.0%
96 | 1 | 1.0%
95 | 1 | 1.0%

CRIM
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIQUE

Distinct: 105
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.100573619
Minimum: 0.0136
Maximum: 45.7461
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 0.0136
5-th percentile: 0.033762
Q1: 0.10084
median: 0.2909
Q3: 4.26131
95-th percentile: 13.2672
Maximum: 45.7461
Range: 45.7325
Interquartile range (IQR): 4.16047

Descriptive statistics

Standard deviation: 6.099267377
Coefficient of variation (CV): 1.967141609
Kurtosis: 22.65881981
Mean: 3.100573619
Median Absolute Deviation (MAD): 0.24796
Skewness: 3.954879403
Sum: 325.56023
Variance: 37.20106254
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
1.49632 | 1 | 1.0%
3.83684 | 1 | 1.0%
12.2472 | 1 | 1.0%
0.12932 | 1 | 1.0%
0.04294 | 1 | 1.0%
0.19186 | 1 | 1.0%
0.07886 | 1 | 1.0%
0.29916 | 1 | 1.0%
0.09512 | 1 | 1.0%
14.4383 | 1 | 1.0%
Other values (95) | 95 | 90.5%

Minimum 10 values
Value | Count | Frequency (%)
0.0136 | 1 | 1.0%
0.01432 | 1 | 1.0%
0.02177 | 1 | 1.0%
0.02187 | 1 | 1.0%
0.02729 | 1 | 1.0%
0.03359 | 1 | 1.0%
0.03445 | 1 | 1.0%
0.0351 | 1 | 1.0%
0.04011 | 1 | 1.0%
0.04294 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
45.7461 | 1 | 1.0%
18.0846 | 1 | 1.0%
17.8667 | 1 | 1.0%
15.5757 | 1 | 1.0%
14.4383 | 1 | 1.0%
13.5222 | 1 | 1.0%
12.2472 | 1 | 1.0%
11.9511 | 1 | 1.0%
11.8123 | 1 | 1.0%
11.5779 | 1 | 1.0%

ZN
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 18
Distinct (%): 17.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.8
Minimum: 0
Maximum: 100
Zeros: 76
Zeros (%): 72.4%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 12.5
95-th percentile: 80
Maximum: 100
Range: 100
Interquartile range (IQR): 12.5

Descriptive statistics

Standard deviation: 25.38497814
Coefficient of variation (CV): 1.983201418
Kurtosis: 3.167055883
Mean: 12.8
Median Absolute Deviation (MAD): 0
Skewness: 2.05616692
Sum: 1344
Variance: 644.3971154
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=18)

Common values
Value | Count | Frequency (%)
0 | 76 | 72.4%
80 | 3 | 2.9%
12.5 | 3 | 2.9%
22 | 3 | 2.9%
30 | 2 | 1.9%
40 | 2 | 1.9%
20 | 2 | 1.9%
45 | 2 | 1.9%
75 | 2 | 1.9%
82.5 | 2 | 1.9%
Other values (8) | 8 | 7.6%

Minimum 10 values
Value | Count | Frequency (%)
0 | 76 | 72.4%
12.5 | 3 | 2.9%
20 | 2 | 1.9%
21 | 1 | 1.0%
22 | 3 | 2.9%
25 | 1 | 1.0%
28 | 1 | 1.0%
30 | 2 | 1.9%
34 | 1 | 1.0%
40 | 2 | 1.9%

Maximum 10 values
Value | Count | Frequency (%)
100 | 1 | 1.0%
95 | 1 | 1.0%
82.5 | 2 | 1.9%
80 | 3 | 2.9%
75 | 2 | 1.9%
60 | 1 | 1.0%
52.5 | 1 | 1.0%
45 | 2 | 1.9%
40 | 2 | 1.9%
34 | 1 | 1.0%

INDUS
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 40
Distinct (%): 38.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.98409524
Minimum: 1.32
Maximum: 27.74
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 1.32
5-th percentile: 2.722
Q1: 6.09
median: 9.9
Q3: 18.1
95-th percentile: 21.89
Maximum: 27.74
Range: 26.42
Interquartile range (IQR): 12.01

Descriptive statistics

Standard deviation: 6.854822937
Coefficient of variation (CV): 0.5719933629
Kurtosis: -1.299680983
Mean: 11.98409524
Median Absolute Deviation (MAD): 6.95
Skewness: 0.187017998
Sum: 1258.33
Variance: 46.98859749
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=40)

Common values
Value | Count | Frequency (%)
18.1 | 29 | 27.6%
19.58 | 7 | 6.7%
21.89 | 5 | 4.8%
8.56 | 4 | 3.8%
8.14 | 4 | 3.8%
9.9 | 3 | 2.9%
9.69 | 3 | 2.9%
6.2 | 3 | 2.9%
5.86 | 3 | 2.9%
10.01 | 3 | 2.9%
Other values (30) | 41 | 39.0%

Minimum 10 values
Value | Count | Frequency (%)
1.32 | 1 | 1.0%
1.52 | 1 | 1.0%
2.03 | 2 | 1.9%
2.46 | 1 | 1.0%
2.68 | 1 | 1.0%
2.89 | 1 | 1.0%
2.93 | 1 | 1.0%
2.95 | 1 | 1.0%
3.44 | 2 | 1.9%
3.64 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
27.74 | 1 | 1.0%
25.65 | 2 | 1.9%
21.89 | 5 | 4.8%
19.58 | 7 | 6.7%
18.1 | 29 | 27.6%
15.04 | 1 | 1.0%
13.92 | 1 | 1.0%
12.83 | 2 | 1.9%
10.01 | 3 | 2.9%
9.9 | 3 | 2.9%

CHAS
Categorical

Distinct: 2
Distinct (%): 1.9%
Missing: 0
Missing (%): 0.0%
Memory size: 968.0 B
0 | 99
1 | 6

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 105
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
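The character and category counts in this section can be reproduced with Python's standard `unicodedata` module. This minimal sketch rebuilds the CHAS column as strings from the counts in this report (99 zeros, 6 ones); `Nd` is the Unicode general category whose long name is "Decimal Number". Script and block lookups are not part of `unicodedata`, so the sketch leaves them out.

```python
import unicodedata
from collections import Counter

# CHAS values as strings, rebuilt from the counts in this report: 99 zeros and 6 ones.
chas = ["0"] * 99 + ["1"] * 6

# Count every character, then every Unicode general category (e.g. "Nd" = Decimal Number).
chars = Counter(ch for value in chas for ch in value)
categories = Counter(unicodedata.category(ch) for value in chas for ch in value)

print(chars)
print(categories)
```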

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value | Count | Frequency (%)
0 | 99 | 94.3%
1 | 6 | 5.7%


Most occurring characters

Value | Count | Frequency (%)
0 | 99 | 94.3%
1 | 6 | 5.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 105 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 99 | 94.3%
1 | 6 | 5.7%

Most occurring scripts

Value | Count | Frequency (%)
Common | 105 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 99 | 94.3%
1 | 6 | 5.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 105 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 99 | 94.3%
1 | 6 | 5.7%

NOX
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 48
Distinct (%): 45.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5620104762
Minimum: 0.392
Maximum: 0.871
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 0.392
5-th percentile: 0.411
Q1: 0.449
median: 0.544
Q3: 0.624
95-th percentile: 0.77
Maximum: 0.871
Range: 0.479
Interquartile range (IQR): 0.175

Descriptive statistics

Standard deviation: 0.1181585391
Coefficient of variation (CV): 0.2102425918
Kurtosis: -0.2736210878
Mean: 0.5620104762
Median Absolute Deviation (MAD): 0.087
Skewness: 0.6120962056
Sum: 59.0111
Variance: 0.01396144037
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=48)

Common values
Value | Count | Frequency (%)
0.77 | 5 | 4.8%
0.624 | 5 | 4.8%
0.437 | 5 | 4.8%
0.713 | 4 | 3.8%
0.584 | 4 | 3.8%
0.538 | 4 | 3.8%
0.52 | 4 | 3.8%
0.605 | 4 | 3.8%
0.544 | 3 | 2.9%
0.428 | 3 | 2.9%
Other values (38) | 64 | 61.0%

Minimum 10 values
Value | Count | Frequency (%)
0.392 | 1 | 1.0%
0.401 | 1 | 1.0%
0.404 | 1 | 1.0%
0.405 | 1 | 1.0%
0.41 | 1 | 1.0%
0.411 | 2 | 1.9%
0.415 | 2 | 1.9%
0.4161 | 1 | 1.0%
0.428 | 3 | 2.9%
0.431 | 3 | 2.9%

Maximum 10 values
Value | Count | Frequency (%)
0.871 | 3 | 2.9%
0.77 | 5 | 4.8%
0.74 | 1 | 1.0%
0.718 | 2 | 1.9%
0.713 | 4 | 3.8%
0.7 | 2 | 1.9%
0.693 | 3 | 2.9%
0.679 | 2 | 1.9%
0.671 | 1 | 1.0%
0.659 | 1 | 1.0%

RM
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 99
Distinct (%): 94.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.253180952
Minimum: 3.561
Maximum: 7.929
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 3.561
5-th percentile: 5.3676
Q1: 5.949
median: 6.195
Q3: 6.631
95-th percentile: 7.3438
Maximum: 7.929
Range: 4.368
Interquartile range (IQR): 0.682

Descriptive statistics

Standard deviation: 0.6793687225
Coefficient of variation (CV): 0.1086437011
Kurtosis: 3.199066059
Mean: 6.253180952
Median Absolute Deviation (MAD): 0.316
Skewness: -0.6915677816
Sum: 656.584
Variance: 0.4615418612
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
6.431 | 2 | 1.9%
5.926 | 2 | 1.9%
6.315 | 2 | 1.9%
6.251 | 2 | 1.9%
6.108 | 2 | 1.9%
6.193 | 2 | 1.9%
6.511 | 1 | 1.0%
7.148 | 1 | 1.0%
5.713 | 1 | 1.0%
6.02 | 1 | 1.0%
Other values (89) | 89 | 84.8%

Minimum 10 values
Value | Count | Frequency (%)
3.561 | 1 | 1.0%
3.863 | 1 | 1.0%
4.519 | 1 | 1.0%
5.036 | 1 | 1.0%
5.093 | 1 | 1.0%
5.362 | 1 | 1.0%
5.39 | 1 | 1.0%
5.404 | 1 | 1.0%
5.565 | 1 | 1.0%
5.608 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
7.929 | 1 | 1.0%
7.853 | 1 | 1.0%
7.61 | 1 | 1.0%
7.489 | 1 | 1.0%
7.47 | 1 | 1.0%
7.358 | 1 | 1.0%
7.287 | 1 | 1.0%
7.267 | 1 | 1.0%
7.185 | 1 | 1.0%
7.178 | 1 | 1.0%

AGE
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 89
Distinct (%): 84.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 70.46952381
Minimum: 6.8
Maximum: 100
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 6.8
5-th percentile: 17.54
Q1: 47.2
median: 78.1
Q3: 94.5
95-th percentile: 100
Maximum: 100
Range: 93.2
Interquartile range (IQR): 47.3

Descriptive statistics

Standard deviation: 27.41012279
Coefficient of variation (CV): 0.3889642119
Kurtosis: -0.7720340122
Mean: 70.46952381
Median Absolute Deviation (MAD): 18.1
Skewness: -0.7076608091
Sum: 7399.3
Variance: 751.3148315
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
100 | 10 | 9.5%
97 | 2 | 1.9%
76.5 | 2 | 1.9%
87.9 | 2 | 1.9%
94.3 | 2 | 1.9%
96.1 | 2 | 1.9%
96.2 | 2 | 1.9%
96 | 2 | 1.9%
61.1 | 1 | 1.0%
91.2 | 1 | 1.0%
Other values (79) | 79 | 75.2%

Minimum 10 values
Value | Count | Frequency (%)
6.8 | 1 | 1.0%
9.9 | 1 | 1.0%
14.7 | 1 | 1.0%
15.7 | 1 | 1.0%
15.8 | 1 | 1.0%
17.5 | 1 | 1.0%
17.7 | 1 | 1.0%
21.1 | 1 | 1.0%
21.5 | 1 | 1.0%
26.3 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
100 | 10 | 9.5%
98.8 | 1 | 1.0%
98 | 1 | 1.0%
97.7 | 1 | 1.0%
97.3 | 1 | 1.0%
97 | 2 | 1.9%
96.8 | 1 | 1.0%
96.6 | 1 | 1.0%
96.2 | 2 | 1.9%
96.1 | 2 | 1.9%

DIS
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 100
Distinct (%): 95.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.699208571
Minimum: 1.2852
Maximum: 9.2203
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 1.2852
5-th percentile: 1.59566
Q1: 2.0635
median: 2.7831
Q3: 5.1167
95-th percentile: 7.3192
Maximum: 9.2203
Range: 7.9351
Interquartile range (IQR): 3.0532

Descriptive statistics

Standard deviation: 2.017964376
Coefficient of variation (CV): 0.5455124623
Kurtosis: -0.2737210523
Mean: 3.699208571
Median Absolute Deviation (MAD): 1.1249
Skewness: 0.8755155179
Sum: 388.4169
Variance: 4.072180224
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
6.4798 | 2 | 1.9%
6.27 | 2 | 1.9%
2.3817 | 2 | 1.9%
4.148 | 2 | 1.9%
5.4159 | 2 | 1.9%
7.3172 | 1 | 1.0%
6.3467 | 1 | 1.0%
1.5916 | 1 | 1.0%
1.3861 | 1 | 1.0%
2.5091 | 1 | 1.0%
Other values (90) | 90 | 85.7%

Minimum 10 values
Value | Count | Frequency (%)
1.2852 | 1 | 1.0%
1.3567 | 1 | 1.0%
1.3861 | 1 | 1.0%
1.4655 | 1 | 1.0%
1.5106 | 1 | 1.0%
1.5916 | 1 | 1.0%
1.6119 | 1 | 1.0%
1.6132 | 1 | 1.0%
1.618 | 1 | 1.0%
1.639 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
9.2203 | 1 | 1.0%
8.9067 | 1 | 1.0%
8.3248 | 1 | 1.0%
8.0555 | 1 | 1.0%
7.8265 | 1 | 1.0%
7.3197 | 1 | 1.0%
7.3172 | 1 | 1.0%
7.309 | 1 | 1.0%
7.0355 | 1 | 1.0%
6.8185 | 1 | 1.0%

RAD
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 9
Distinct (%): 8.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.961904762
Minimum: 1
Maximum: 24
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 5
Q3: 24
95-th percentile: 24
Maximum: 24
Range: 23
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 8.814240891
Coefficient of variation (CV): 0.8847947357
Kurtosis: -1.026162408
Mean: 9.961904762
Median Absolute Deviation (MAD): 1
Skewness: 0.9387388343
Sum: 1046
Variance: 77.69084249
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=9)

Common values
Value | Count | Frequency (%)
24 | 29 | 27.6%
5 | 25 | 23.8%
4 | 20 | 19.0%
6 | 9 | 8.6%
2 | 7 | 6.7%
3 | 5 | 4.8%
8 | 4 | 3.8%
7 | 4 | 3.8%
1 | 2 | 1.9%

Minimum values
Value | Count | Frequency (%)
1 | 2 | 1.9%
2 | 7 | 6.7%
3 | 5 | 4.8%
4 | 20 | 19.0%
5 | 25 | 23.8%
6 | 9 | 8.6%
7 | 4 | 3.8%
8 | 4 | 3.8%
24 | 29 | 27.6%

Maximum values
Value | Count | Frequency (%)
24 | 29 | 27.6%
8 | 4 | 3.8%
7 | 4 | 3.8%
6 | 9 | 8.6%
5 | 25 | 23.8%
4 | 20 | 19.0%
3 | 5 | 4.8%
2 | 7 | 6.7%
1 | 2 | 1.9%

TAX
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 37
Distinct (%): 35.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 426.2190476
Minimum: 188
Maximum: 711
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 188
5-th percentile: 242.2
Q1: 300
median: 391
Q3: 666
95-th percentile: 666
Maximum: 711
Range: 523
Interquartile range (IQR): 366

Descriptive statistics

Standard deviation: 164.2400905
Coefficient of variation (CV): 0.3853419771
Kurtosis: -1.237953932
Mean: 426.2190476
Median Absolute Deviation (MAD): 98
Skewness: 0.5589900046
Sum: 44753
Variance: 26974.80733
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=37)

Common values
Value | Count | Frequency (%)
666 | 29 | 27.6%
403 | 7 | 6.7%
307 | 7 | 6.7%
437 | 5 | 4.8%
398 | 4 | 3.8%
384 | 4 | 3.8%
287 | 3 | 2.9%
311 | 3 | 2.9%
391 | 3 | 2.9%
330 | 3 | 2.9%
Other values (27) | 37 | 35.2%

Minimum 10 values
Value | Count | Frequency (%)
188 | 2 | 1.9%
193 | 1 | 1.0%
223 | 1 | 1.0%
224 | 1 | 1.0%
242 | 1 | 1.0%
243 | 1 | 1.0%
245 | 1 | 1.0%
247 | 1 | 1.0%
252 | 1 | 1.0%
254 | 2 | 1.9%

Maximum 10 values
Value | Count | Frequency (%)
711 | 1 | 1.0%
666 | 29 | 27.6%
469 | 1 | 1.0%
437 | 5 | 4.8%
432 | 3 | 2.9%
403 | 7 | 6.7%
398 | 4 | 3.8%
391 | 3 | 2.9%
384 | 4 | 3.8%
348 | 2 | 1.9%

PTRATIO
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 31
Distinct (%): 29.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 18.47238095
Minimum: 12.6
Maximum: 21.2
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 12.6
5-th percentile: 14.7
Q1: 16.6
median: 19.1
Q3: 20.2
95-th percentile: 21.08
Maximum: 21.2
Range: 8.6
Interquartile range (IQR): 3.6

Descriptive statistics

Standard deviation: 2.227040777
Coefficient of variation (CV): 0.12056057
Kurtosis: -0.626267422
Mean: 18.47238095
Median Absolute Deviation (MAD): 1.1
Skewness: -0.7257477814
Sum: 1939.6
Variance: 4.959710623
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=31)

Common values
Value | Count | Frequency (%)
20.2 | 29 | 27.6%
14.7 | 10 | 9.5%
19.1 | 5 | 4.8%
16.6 | 5 | 4.8%
15.2 | 5 | 4.8%
21.2 | 5 | 4.8%
17.8 | 5 | 4.8%
21 | 4 | 3.8%
20.9 | 4 | 3.8%
19.2 | 4 | 3.8%
Other values (21) | 29 | 27.6%

Minimum 10 values
Value | Count | Frequency (%)
12.6 | 1 | 1.0%
13 | 1 | 1.0%
14.7 | 10 | 9.5%
15.1 | 1 | 1.0%
15.2 | 5 | 4.8%
15.6 | 1 | 1.0%
16 | 1 | 1.0%
16.1 | 1 | 1.0%
16.4 | 1 | 1.0%
16.6 | 5 | 4.8%

Maximum 10 values
Value | Count | Frequency (%)
21.2 | 5 | 4.8%
21.1 | 1 | 1.0%
21 | 4 | 3.8%
20.9 | 4 | 3.8%
20.2 | 29 | 27.6%
20.1 | 1 | 1.0%
19.7 | 1 | 1.0%
19.6 | 3 | 2.9%
19.2 | 4 | 3.8%
19.1 | 5 | 4.8%

B
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 83
Distinct (%): 79.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 345.6953333
Minimum: 3.65
Maximum: 396.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 3.65
5-th percentile: 48.944
Q1: 373.66
median: 390.74
Q3: 395.62
95-th percentile: 396.9
Maximum: 396.9
Range: 393.25
Interquartile range (IQR): 21.96

Descriptive statistics

Standard deviation: 106.9761559
Coefficient of variation (CV): 0.3094521262
Kurtosis: 4.09795367
Mean: 345.6953333
Median Absolute Deviation (MAD): 6.16
Skewness: -2.349808247
Sum: 36298.01
Variance: 11443.89793
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
396.9 | 22 | 21.0%
393.74 | 2 | 1.9%
388.08 | 1 | 1.0%
394.62 | 1 | 1.0%
350.65 | 1 | 1.0%
393.23 | 1 | 1.0%
390.74 | 1 | 1.0%
272.21 | 1 | 1.0%
179.36 | 1 | 1.0%
395.38 | 1 | 1.0%
Other values (73) | 73 | 69.5%

Minimum 10 values
Value | Count | Frequency (%)
3.65 | 1 | 1.0%
6.68 | 1 | 1.0%
10.48 | 1 | 1.0%
24.65 | 1 | 1.0%
27.25 | 1 | 1.0%
48.45 | 1 | 1.0%
50.92 | 1 | 1.0%
81.33 | 1 | 1.0%
88.27 | 1 | 1.0%
96.73 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
396.9 | 22 | 21.0%
396.42 | 1 | 1.0%
395.99 | 1 | 1.0%
395.69 | 1 | 1.0%
395.67 | 1 | 1.0%
395.62 | 1 | 1.0%
395.6 | 1 | 1.0%
395.59 | 1 | 1.0%
395.43 | 1 | 1.0%
395.38 | 1 | 1.0%

LSTAT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 103
Distinct (%): 98.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.6672381
Minimum: 1.73
Maximum: 36.98
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 968.0 B

Quantile statistics

Minimum: 1.73
5-th percentile: 3.234
Q1: 7.12
median: 12.26
Q3: 17.16
95-th percentile: 23.876
Maximum: 36.98
Range: 35.25
Interquartile range (IQR): 10.04

Descriptive statistics

Standard deviation: 6.912011075
Coefficient of variation (CV): 0.5456604686
Kurtosis: 0.7102127849
Mean: 12.6672381
Median Absolute Deviation (MAD): 4.93
Skewness: 0.7128124328
Sum: 1330.06
Variance: 47.77589711
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
13 | 2 | 1.9%
18.13 | 2 | 1.9%
12.26 | 1 | 1.0%
4.03 | 1 | 1.0%
17.11 | 1 | 1.0%
12.14 | 1 | 1.0%
21.78 | 1 | 1.0%
11.34 | 1 | 1.0%
10.19 | 1 | 1.0%
21.14 | 1 | 1.0%
Other values (93) | 93 | 88.6%

Minimum 10 values
Value | Count | Frequency (%)
1.73 | 1 | 1.0%
1.98 | 1 | 1.0%
2.87 | 1 | 1.0%
2.98 | 1 | 1.0%
3.11 | 1 | 1.0%
3.16 | 1 | 1.0%
3.53 | 1 | 1.0%
3.56 | 1 | 1.0%
3.7 | 1 | 1.0%
3.73 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
36.98 | 1 | 1.0%
29.93 | 1 | 1.0%
29.68 | 1 | 1.0%
29.05 | 1 | 1.0%
25.68 | 1 | 1.0%
24.16 | 1 | 1.0%
22.74 | 1 | 1.0%
21.78 | 1 | 1.0%
21.52 | 1 | 1.0%
21.46 | 1 | 1.0%

MEDV Predicted by linear model
Real number (ℝ)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
UNIQUE

Distinct: 105
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 22.1367882
Minimum: -3.779295478
Maximum: 42.17134297
Zeros: 0
Zeros (%): 0.0%
Negative: 1
Negative (%): 1.0%
Memory size: 968.0 B

Quantile statistics

Minimum: -3.779295478
5-th percentile: 11.16565306
Q1: 17.46411745
median: 20.61844153
Q3: 26.33994554
95-th percentile: 36.00535697
Maximum: 42.17134297
Range: 45.95063845
Interquartile range (IQR): 8.875828091

Descriptive statistics

Standard deviation: 7.749044453
Coefficient of variation (CV): 0.3500527891
Kurtosis: 0.7676617979
Mean: 22.1367882
Median Absolute Deviation (MAD): 4.255160568
Skewness: 0.03745379886
Sum: 2324.362761
Variance: 60.04768993
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
25.84790155 | 1 | 1.0%
31.98717946 | 1 | 1.0%
16.9630069 | 1 | 1.0%
36.13208657 | 1 | 1.0%
24.75769389 | 1 | 1.0%
24.83057575 | 1 | 1.0%
11.98682424 | 1 | 1.0%
29.88718904 | 1 | 1.0%
19.96979166 | 1 | 1.0%
16.54078606 | 1 | 1.0%
Other values (95) | 95 | 90.5%

Minimum 10 values
Value | Count | Frequency (%)
-3.779295478 | 1 | 1.0%
3.840712332 | 1 | 1.0%
6.470091495 | 1 | 1.0%
9.136476252 | 1 | 1.0%
9.630234866 | 1 | 1.0%
10.96036026 | 1 | 1.0%
11.98682424 | 1 | 1.0%
13.03358521 | 1 | 1.0%
13.46400964 | 1 | 1.0%
13.77502146 | 1 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
42.17134297 | 1 | 1.0%
39.3907291 | 1 | 1.0%
37.48506051 | 1 | 1.0%
37.34360506 | 1 | 1.0%
36.19886959 | 1 | 1.0%
36.13208657 | 1 | 1.0%
35.49843855 | 1 | 1.0%
33.97780962 | 1 | 1.0%
33.63671683 | 1 | 1.0%
33.29941152 | 1 | 1.0%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it is better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfectly negative monotonic relationship, 0 indicates no monotonic correlation, and +1 indicates a perfectly positive monotonic relationship.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
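A minimal pure-Python sketch of that definition: rank each variable (tied values receive the average of the positions they occupy, a common convention assumed here), then apply Pearson's formula to the ranks. The helper names are illustrative and are not part of pandas-profiling.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of equal values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Covariance over the product of standard deviations (the 1/(n-1) factors cancel)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the rank variables of x and y."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(round(pearson(x, y), 4), spearman(x, y))
```

For y = x³, Pearson's r is about 0.94 because the relationship is curved, while ρ is 1 because the relationship is perfectly monotonic.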

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfectly negative linear relationship, 0 indicates no linear correlation, and +1 indicates a perfectly positive linear relationship. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. As a consequence, for an exact linear relationship y = a + bx, r is +1 or -1 depending only on the sign of the slope b, not on its steepness (r is undefined when b = 0).

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
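The definition translates directly into code. This sketch uses only the standard `statistics` module (the function name `pearson_r` and the sample numbers are illustrative). It also checks the invariance property described above: shifting and positively rescaling either variable leaves r unchanged, and an exact line gives r = +1 or -1 whatever its slope.

```python
import statistics

def pearson_r(x, y):
    """Sample covariance of x and y divided by the product of their sample standard deviations."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]  # roughly linear, with some noise

r = pearson_r(x, y)
# Separate changes of location and (positive) scale do not change r.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
# An exact line gives |r| = 1 for any non-zero slope; the sign follows the slope.
r_steep = pearson_r(x, [25 * a + 1 for a in x])
r_falling = pearson_r(x, [-0.2 * a + 4 for a in x])

print(r, r_transformed, r_steep, r_falling)
```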

Kendall's τ

Like Spearman's ρ, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates perfect negative correlation, 0 indicates no correlation, and +1 indicates perfect positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
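Counting pairs as described gives the following sketch (the function name is illustrative). It implements exactly the formula above, usually called τ-a; a pair tied in either variable counts as neither concordant nor discordant. When ties are common, as in ZN, RAD, or CHAS here, libraries often report the tie-adjusted τ-b instead (SciPy's `scipy.stats.kendalltau` defaults to it), so values from real tools can differ somewhat from this sketch.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """tau-a = (concordant - discordant) / (n * (n - 1) / 2)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
        # s == 0: tied in x or y, counted in neither group
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# One swapped pair out of six: 5 concordant, 1 discordant, tau = 4/6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```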

Phik (φk)

Phik (φk) is a relatively new correlation coefficient that works consistently across categorical, ordinal, and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
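The report draws the nullity matrix as an image, which does not survive in this text. As a rough text-only stand-in, the sketch below prints one line per row, with `#` for a present value and `.` for a missing one. Treating `None` (or an absent key) as the only missing marker is an assumption of this sketch, and the rows are hypothetical. For the profiled dataset, which has 0 missing cells, every line would be all `#`.

```python
def nullity_matrix(rows, columns):
    """Return one string per row: '#' where the column has a value, '.' where it is None or absent."""
    return ["".join("#" if row.get(col) is not None else "." for col in columns) for row in rows]

# Hypothetical rows; the profiled dataset itself has no missing cells.
columns = ["a", "b", "c"]
rows = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 4, "b": None, "c": 6},
    {"a": None, "b": 8},
]
for line in nullity_matrix(rows, columns):
    print(line)
```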

Sample

First rows

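Index | Unnamed: 0 | ID | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV Predicted by linear model
0 | 0 | 0 | 0.10612 | 30.0 | 4.93 | 0 | 0.428 | 6.095 | 65.1 | 6.3361 | 6 | 300.0 | 16.6 | 394.62 | 12.40 | 23.765788
1 | 1 | 1 | 0.34109 | 0.0 | 7.38 | 0 | 0.493 | 6.415 | 40.1 | 4.7211 | 5 | 287.0 | 19.6 | 396.90 | 6.12 | 25.113261
2 | 2 | 2 | 12.24720 | 0.0 | 18.10 | 0 | 0.584 | 5.837 | 59.7 | 1.9976 | 24 | 666.0 | 20.2 | 24.65 | 15.69 | 15.056591
3 | 3 | 3 | 0.22489 | 12.5 | 7.87 | 0 | 0.524 | 6.377 | 94.3 | 6.3467 | 5 | 311.0 | 15.2 | 392.52 | 20.45 | 20.631449
4 | 4 | 4 | 1.80028 | 0.0 | 19.58 | 0 | 0.605 | 5.877 | 79.2 | 2.4259 | 5 | 403.0 | 14.7 | 227.61 | 12.14 | 21.755657
5 | 5 | 5 | 0.05188 | 0.0 | 4.49 | 0 | 0.449 | 6.015 | 45.1 | 4.4272 | 3 | 247.0 | 18.5 | 395.99 | 12.86 | 22.559088
6 | 6 | 6 | 0.32543 | 0.0 | 21.89 | 0 | 0.624 | 6.431 | 98.8 | 1.8125 | 4 | 437.0 | 21.2 | 396.90 | 15.39 | 19.214084
7 | 7 | 7 | 0.29916 | 20.0 | 6.96 | 0 | 0.464 | 5.856 | 42.1 | 4.4290 | 3 | 223.0 | 18.6 | 388.65 | 13.00 | 22.183464
8 | 8 | 8 | 0.03359 | 75.0 | 2.95 | 0 | 0.428 | 7.024 | 15.8 | 5.4011 | 3 | 252.0 | 18.3 | 395.62 | 1.98 | 33.636717
9 | 9 | 9 | 11.95110 | 0.0 | 18.10 | 0 | 0.659 | 5.608 | 100.0 | 1.2852 | 24 | 666.0 | 20.2 | 332.09 | 12.13 | 18.179175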

Last rows

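Index | Unnamed: 0 | ID | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV Predicted by linear model
95 | 95 | 95 | 0.11504 | 0.0 | 2.89 | 0 | 0.445 | 6.163 | 69.6 | 3.4952 | 2 | 276.0 | 18.0 | 391.83 | 11.34 | 24.757694
96 | 96 | 96 | 0.04011 | 80.0 | 1.52 | 0 | 0.404 | 7.287 | 34.1 | 7.3090 | 2 | 329.0 | 12.6 | 396.90 | 4.08 | 36.132087
97 | 97 | 97 | 0.20746 | 0.0 | 27.74 | 0 | 0.609 | 5.093 | 98.0 | 1.8226 | 4 | 711.0 | 20.1 | 318.43 | 29.68 | 3.840712
98 | 98 | 98 | 0.29090 | 0.0 | 21.89 | 0 | 0.624 | 6.174 | 93.6 | 1.6119 | 4 | 437.0 | 21.2 | 388.08 | 24.16 | 14.785727
99 | 99 | 99 | 0.06888 | 0.0 | 2.46 | 0 | 0.488 | 6.144 | 62.2 | 2.5979 | 3 | 193.0 | 17.8 | 396.90 | 9.45 | 27.416104
100 | 100 | 100 | 0.19073 | 22.0 | 5.86 | 0 | 0.431 | 6.718 | 17.5 | 7.8265 | 7 | 330.0 | 19.1 | 393.74 | 6.56 | 24.873602
101 | 101 | 101 | 6.96215 | 0.0 | 18.10 | 0 | 0.700 | 5.713 | 97.0 | 1.9265 | 24 | 666.0 | 20.2 | 394.43 | 17.11 | 16.540786
102 | 102 | 102 | 0.05360 | 21.0 | 5.64 | 0 | 0.439 | 6.511 | 21.1 | 6.8147 | 4 | 243.0 | 16.8 | 396.90 | 5.28 | 27.767235
103 | 103 | 103 | 0.10469 | 40.0 | 6.41 | 1 | 0.447 | 7.267 | 49.0 | 4.7872 | 4 | 254.0 | 17.6 | 389.25 | 6.05 | 35.498439
104 | 104 | 104 | 4.55587 | 0.0 | 18.10 | 0 | 0.718 | 3.561 | 87.9 | 1.6132 | 24 | 666.0 | 20.2 | 354.70 | 7.12 | 9.136476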